Collocation Map for Overcoming Data Sparseness

Authors

  • Moonjoo Kim
  • Young S. Han
  • Key-Sun Choi
Abstract

Statistical language models are useful because they provide probabilistic information for decision making under uncertainty. The most common statistic is the n-gram, which measures word co-occurrences in texts. This method suffers from a data shortage problem, however. In this paper, we suggest that Bayesian networks be used to approximate, with graceful degradation, the statistics of word pairs that occur too rarely or not at all in the sample texts. A Collocation map is a sigmoid belief network that can be constructed from bigrams. We compared the conditional probabilities and mutual information computed from bigrams and from the Collocation map. The results show that, for infrequent pairs, the variance of the values from the Collocation map is 48% smaller than that from the frequency measure. The predictive power of the Collocation map for arbitrary associations not observed in the sample texts is also demonstrated.
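
As an illustration of the frequency-based measures the abstract compares against, the following is a minimal Python sketch of relative-frequency conditional probabilities and pointwise mutual information computed from adjacent-word bigrams. It is not the Collocation map itself, and the function names, the toy sentence, and the choice of base-2 logarithm are assumptions made for this example only. It shows the sparseness problem directly: a pair that never occurs in the sample gets probability zero and has no defined mutual information.

```python
import math
from collections import Counter

def bigram_stats(tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def conditional_prob(w1, w2, unigrams, bigrams):
    """Relative-frequency estimate of P(w2 | w1); 0.0 when w1 is unseen."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information in bits; None for pairs never observed."""
    if bigrams[(w1, w2)] == 0:
        return None  # log of zero is undefined: the data sparseness problem
    n_tokens = sum(unigrams.values())
    n_pairs = sum(bigrams.values())
    p_xy = bigrams[(w1, w2)] / n_pairs
    p_x = unigrams[w1] / n_tokens
    p_y = unigrams[w2] / n_tokens
    return math.log2(p_xy / (p_x * p_y))

# Toy corpus (hypothetical, for illustration only).
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = bigram_stats(tokens)

print(conditional_prob("the", "cat", unigrams, bigrams))  # observed pair
print(conditional_prob("the", "dog", unigrams, bigrams))  # unseen pair
print(pmi("the", "dog", unigrams, bigrams))               # unseen pair
```

The unseen pair is where a smoothing model is needed; the abstract proposes a sigmoid belief network built from the bigrams to supply estimates in exactly those cases.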

Similar resources

Dynamic Coupling Map: Acceleration Space Analysis for Underactuated Robots

Humans and animals are capable of overcoming complex terrain challenges with graceful and agile movements. One of the key ingredients for such complex behaviors is motion coordination to exploit passive dynamics. We present a direct collocation trajectory optimization to find an optimal control policy and generate an optimal trajectory for the swing-up motion of a gymnast on a high bar. Notwithstanding...

SOLVING SINGULAR ODES IN UNBOUNDED DOMAINS WITH SINC-COLLOCATION METHOD

Spectral approximations for ODEs in unbounded domains have received only limited attention. Singular initial value problems arise in many applied problems. Most numerical methods have difficulty solving these problems and often cannot pass the singular point successfully. In this paper, we apply the sinc-collocation method to solve singular initial value problems. The ability...

Target Word Selection Using WordNet and Data-Driven Models in Machine Translation

Collocation information plays an important role in target word selection in machine translation. However, a collocation dictionary covers only a limited portion of the selection operation because of data sparseness. To resolve the sparseness problem, we proposed a new methodology that selects target words after determining an appropriate collocation class by using an inter-word semantic similarity...

Collocations as Word Co-occurrence Restriction Data - An Application to Japanese Word Processor

Collocations, combinations of specific words, are quite useful linguistic resources for NLP in general. The purpose of this paper is to show their usefulness by way of an application to Kanji character decision processes for Japanese word processors. Unlike recent trials of automatic extraction, our collocations were collected manually through many years of intensive investigation of corp...

The Application of Fuzzy Logic to Collocation Extraction

Collocations are important for many natural language processing tasks, such as information retrieval, machine translation, and computational lexicography. So far, many statistical methods have been used for collocation extraction. Almost all of these methods form a classical crisp set of collocations. We propose a fuzzy logic approach to collocation extraction to form a fuzzy set of collocations in ...

Publication date: 1995